@philliphoff (Member) commented Jun 12, 2025

NOTE: !!! Do not merge until the associated protobuf PR is committed and this PR has been updated to match. !!!

Ports Durable Task Framework (DTFx) tracing behavior to the Durable Task .NET SDK gRPC layer to allow end-to-end tracing of orchestrations from creation to completion, including:

  • Client-side
    • Creating orchestrations
    • Raising events
  • Worker-side
    • Executing orchestrations (where all activity is parented to a single "logical" span)
    • Creating suborchestrations
    • Creating timers
    • Sending events

With this change, users get a complete picture of orchestration execution, including how their own traces dovetail with Durable Task execution. The following example follows the creation of an orchestration from a web API, through execution by a DT worker (with suborchestrations), and back to the web API as part of task execution.

[Image: Screenshot 2025-06-12 at 16 36 03]

This change depends on corresponding changes to the Durable Task protobuf definitions. It also assumes that any backend properly accepts and returns the parent trace contexts given to it by the SDK.
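
To illustrate the round trip this assumption relies on, here is a minimal sketch of a worker resuming a trace from a W3C `traceparent` string handed back by a backend. It uses only `System.Diagnostics`; the class name, method name, and the source name `Sketch.DurableTask` are hypothetical and are not taken from the SDK.

```csharp
using System;
using System.Diagnostics;

public static class TraceResumeSketch
{
    // Hypothetical source name, used only for this sketch.
    static readonly ActivitySource Source = new("Sketch.DurableTask");

    // Starts a span parented to the trace context received from a backend.
    // Returns null when the traceparent is missing or malformed, or when
    // no listener has subscribed to the source.
    public static Activity? StartFromParent(string name, string? traceParent, string? traceState)
    {
        if (traceParent is null
            || !ActivityContext.TryParse(traceParent, traceState, out ActivityContext parent))
        {
            return null;
        }

        return Source.StartActivity(name, ActivityKind.Server, parent);
    }
}
```

If the backend drops or rewrites the context, the parse fails or the new span attaches to the wrong parent, which shows up as a gap in the trace.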

Resolves #439

@philliphoff philliphoff requested a review from cgillum July 14, 2025 19:32
@cgillum cgillum requested review from bachuv and sophiatev July 14, 2025 19:43
@sophiatev (Contributor) left a comment

Just out of curiosity: do we plan to do the same thing for entities at some point?

completionToken: string.Empty, /* doesn't apply */
entityConversionState: null);
entityConversionState: null,
// TODO: Should this activity be created?
Contributor

This might lead to redundancy when this is used by Durable Functions (since DT.Core will also create the trace activity in that case, I think).

Member Author

I was primarily targeting these changes at DT used separately from DF. I will see if I can test the two together. I would expect that DF uses a differently-named activity source, though, so overlap should be ok. I wouldn't want to not trace something in DT because it's being traced in DF.

InstanceId = subOrchestrationAction.InstanceId,
Name = subOrchestrationAction.Name,
Version = subOrchestrationAction.Version,
ParentTraceContext = clientActivityContext is not null
Contributor

Rather than having to create a whole TraceContext object and build these strings manually, can you just save the span ID? It looks like that is the only information TraceHelper.CreateTraceActivityForSchedulingSubOrchestration and TraceHelper.StartTraceActivityForSchedulingTask actually use.

Member Author

Unless I misunderstand, while it's true that TraceHelper only uses a subset of TraceContext, the protos for those history events themselves offer a parent TraceContext, so this is necessary to satisfy that original contract (e.g. across all platforms). Currently, for example, DTS is passing the original parent context at orchestration creation to those events, which is better than nothing but results in gaps in the trace. Allowing the context to round trip gives the SDKs flexibility in adding layers to the trace.

Member Author

That said, I did do some cleanup to simplify/consolidate creation of these context objects.

@sophiatev (Contributor) Jul 21, 2025

Yeah, I see what you mean. It just makes me quite nervous that we're having to build these manually, and that we're altering the parent trace context to now hold an injected client span ID rather than the parent trace activity's span ID. An alternative approach is to add an additional field to the proto object corresponding to the clientSpanId, and just set the TraceParent as Activity.Id as you normally would. If anything, I'd say that this is truer to the original contract: other SDKs accessing this object would expect it to hold the span ID of the parent trace Activity, not an injected client span ID.

All that being said, I do see that DT.Core builds a traceparent manually in an identical way, so maybe it's not a cause for concern.

Member Author

That's where I got it from, so I figured it ought to work at least as well here and I was trying to minimize the changes needed to the protos. That said, I'm also not crazy about having to build the values by hand.
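
For readers following this thread, here is a minimal sketch of what building the value by hand can look like, assuming the standard W3C `traceparent` layout (`version-traceid-parentid-flags`). `TraceParentSketch` and `Build` are hypothetical names; this is not the code from DT.Core or from this PR's TraceHelper.

```csharp
using System;
using System.Diagnostics;

public static class TraceParentSketch
{
    // Formats a W3C traceparent by hand, keeping the trace ID and sampling
    // flag of the given context but substituting a caller-chosen span ID
    // (for example, the ID of a client span recorded earlier).
    public static string Build(ActivityContext context, ActivitySpanId spanId)
    {
        string flags = (context.TraceFlags & ActivityTraceFlags.Recorded) != 0 ? "01" : "00";
        return $"00-{context.TraceId}-{spanId}-{flags}";
    }
}
```

A string produced this way can be checked by passing it back through `ActivityContext.TryParse`, which is a cheap guard against the formatting mistakes that make hand-built values worrying.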

@philliphoff (Member Author)

> Just out of curiosity: do we plan to do the same thing for entities at some point?

My first step was just to reach parity with DTFx, but I think we'd definitely want to extend it to entities/schedules.

@cgillum (Member) left a comment

A few nit-picks as I go through the PR. I have not yet reviewed everything.

return null;
}

if (startEvent.ParentTraceContext is null || !ActivityContext.TryParse(startEvent.ParentTraceContext.TraceParent, startEvent.ParentTraceContext.TraceState, out ActivityContext activityContext))
Member

nit: can you wrap these long lines? Per the .editorconfig file (which not all IDEs support natively, though VSCode.dev seems to), we try to keep line length to a maximum of 120 characters.

Member Author

I believe I've gotten all of these.

/// <remarks>
/// Adapted from "https://github.com/Azure/durabletask/blob/main/src/DurableTask.Core/Tracing/TraceHelper.cs".
/// </remarks>
class TraceHelper
Member

nit: make this a static class.

Member Author

Updated.

/// <summary>
/// Constants for trace activity names used in Durable Task Framework.
/// </summary>
class TraceActivityConstants
Member

nit: make this a static class.

Member Author

Updated.

}
}

if (Activity.Current is not null)
Member

nit: rather than checking separately for Activity.Current being null, could we just move this logic into the previous newActivity != null code block?

Member Author

Yes, this makes sense.

{
P.HistoryEvent? GetSuborchestrationInstanceCreatedEvent(int eventId)
{
var subOrchestrationEvent =
Member

nit: per the existing conventions in this repo, please try to use explicit type names instead of var. It makes the code easier to understand.

Member Author

I believe I've updated all of these.

@philliphoff (Member Author)

@sophiatev Is there more feedback on this change?

@philliphoff philliphoff requested a review from sophiatev August 7, 2025 17:58
@sophiatev (Contributor) left a comment

Looks good to me, but just a few questions:

  1. How did we actually get this working? When I was trying to add distributed tracing for entities I had to do a lot of work-arounds and add most of it to DT.Core because we seemingly couldn't add an Activity source to the dotnet SDK at the time. Did that change? (I wish I would have done my work after you did then :D. It would have improved my implementation by quite a bit)
  2. Does the user have to enable distributed tracing in their settings somehow to get this tracing?
  3. Do we want to make an Activity in the GrpcOrchestrationRunner after all?

@philliphoff (Member Author)

> How did we actually get this working? When I was trying to add distributed tracing for entities I had to do a lot of work-arounds and add most of it to DT.Core because we seemingly couldn't add an Activity source to the dotnet SDK at the time. Did that change?

@sophiatev I got it working/tested by creating an app that references a private build of the SDK and enables this new activity source (Microsoft.DurableTask), and then a backend that respects the protobuf changes that allow the context to be properly passed (changes coming to DTS once this goes in). I'm not sure what issues you ran into before, but AFAIK, any 3rd party can add activity sources.

> Does the user have to enable distributed tracing in their settings somehow to get this tracing?

Yes, users collecting traces would have to add Microsoft.DurableTask as a source.
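
As a rough illustration of opting in to a source by name, the sketch below subscribes with a plain `System.Diagnostics.ActivityListener`. `SourceOptInSketch` and `Listen` are hypothetical names. Apps that use OpenTelemetry .NET would typically express the same opt-in with `AddSource("Microsoft.DurableTask")` on the tracer provider builder instead of writing a listener by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SourceOptInSketch
{
    // Subscribes to a single ActivitySource by name and records the
    // operation name of every span that source starts.
    public static ActivityListener Listen(string sourceName, List<string> started)
    {
        ActivityListener listener = new()
        {
            ShouldListenTo = source => source.Name == sourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
            ActivityStarted = activity => started.Add(activity.OperationName),
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }
}
```

Sources that nobody listens to produce no spans, which is why traces from a differently named source do not appear unless that source is added as well.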

> Do we want to make an Activity in the GrpcOrchestrationRunner after all?

Are you referring to the earlier comment about Durable Functions also having an activity for that method, or something else? If the former, I'd say yes; the DF traces are from a different source so we'd want to ensure it's logged when DT is used on its own.

@sophiatev With tentative approval here, we can now commit the corresponding protobuf changes. That will then mean a small update to this PR to bring them across the "proper way", which will need a re-approval.

@sophiatev (Contributor)

> Yes, users collecting traces would have to add Microsoft.DurableTask as a source.

Perhaps a silly question, but have you tested without doing this to make sure that nothing breaks? I made the mistake of not doing so (in my case, not trying a run without distributed tracing enabled), which ended up in a null exception for customers. Want to make sure others don't repeat my mistakes :D

> Are you referring to the earlier comment about Durable Functions also having an activity for that method, or something else? If the former, I'd say yes; the DF traces are from a different source so we'd want to ensure it's logged when DT is used on its own.

Yes, I am. But it looks like the GrpcOrchestrationRunner is still not creating an Activity? Doesn't this imply that it should? (Although I'm not sure this is called when DT is used on its own, maybe there's a scenario where it will be in the future?)

@philliphoff (Member Author)

> Perhaps a silly question, but have you tested without doing this to make sure that nothing breaks? I made the mistake of not doing so (in my case, not trying a run without distributed tracing enabled), which ended up in a null exception for customers. Want to make sure others don't repeat my mistakes :D

@sophiatev That's a good test case; yes, I've verified that execution proceeds when tracing has not been enabled.
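
The failure mode in question comes from `ActivitySource.StartActivity` returning null when no listener is subscribed. A minimal sketch of the null-tolerant pattern follows, with a hypothetical source name (`Sketch.NoListener`) and hypothetical class and method names:

```csharp
using System;
using System.Diagnostics;

public static class NullSafeTracingSketch
{
    // Hypothetical source name; nothing subscribes to it in this sketch.
    static readonly ActivitySource Source = new("Sketch.NoListener");

    // Runs the work whether or not tracing is enabled and reports whether
    // a span was actually created.
    public static bool RunTraced(Action work)
    {
        // StartActivity returns null when no listener samples this source,
        // so every use of the activity has to tolerate null.
        using Activity? activity = Source.StartActivity("work");
        activity?.SetTag("sketch.example", true);
        work();
        return activity is not null;
    }
}
```

Running a path like this once with no listener registered is the kind of check described above.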

> Yes, I am. But it looks like the GrpcOrchestrationRunner is still not creating an Activity? Doesn't this imply that it should? (Although I'm not sure this is called when DT is used on its own, maybe there's a scenario where it will be in the future?)

I verified that GrpcOrchestrationRunner (and GrpcEntityRunner) are not called within the SDK and are only called by the DF extension (and by no other external libraries, as far as GH can find). At least for now, I'm inclined to leave tracing out of them, given the specific focus of this PR on DT. It would also be a different source from whatever AF/DF was doing anyway, and there doesn't seem to be an equivalent for activities, so there would still be gaps in the DT traces. If and when it is needed to provide a coherent tracing story for DF, I'd be open to adding them, but I feel like that's a completely different set of tests and fixes.

@philliphoff philliphoff merged commit b3064cb into microsoft:main Aug 12, 2025
4 checks passed
@philliphoff philliphoff deleted the philliphoff-orchestration-tracing branch August 12, 2025 16:25
@philliphoff (Member Author)

@sophiatev Is there a regular schedule/process for new releases?

Development

Successfully merging this pull request may close these issues.

Tracing parity with Durable Task Framework (DTFx)